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Abstract 

Random fields play a central role in the analysis of spatially correlated data and, as a result, 
have a significant impact on a broad array of scientific applications. Given the importance of 
this topic, there has been a substantial amount of research devoted to this area. However, the 
cepstral random field model remains largely underdeveloped outside the engineering literature. 
We provide a comprehensive treatment of the asymptotic theory for two-dimensional random 
field models. In particular, we provide recursive formulas that connect the spatial cepstral 
coefficients to an equivalent moving-average random field, which facilitates easy computation of 
the necessary autocovariance matrix. Additionally, we establish asymptotic consistency results 
for Bayesian, maximum likelihood, and quasi-maximum likelihood estimation of random field 
parameters and regression parameters. Further, in both the maximum and quasi-maximum 
likelihood frameworks we derive the asymptotic distribution of our estimator. The theoretical 
results are presented generally and are of independent interest, pertaining to a wide class of 
random field models. The results for the cepstral model facilitate model-building: because the 
cepstral coefficients are unconstrained in practice, numerical optimization is greatly simplified, 
and we are always guaranteed a positive definite covariance matrix. We show that inference for 
individual coefficients is possible, and one can refine models in a disciplined manner. 

Keywords: Bayesian estimation; Cepstrum; Exponential spectral representation; Lattice data; 
Spatial statistics; Spectral density. 
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1 Introduction 



Spatial data feature heavily in many scientific disciplines including ec ology, environm ental science, 
epidemiology, geography, geology and socio-demographics. Following ICressid (j 19931 ). spatial data 
can be broadly placed into three categories: geostatistical data, lattice data, and spatial patterns. 
Here, our focus mainly resides in the development of cepstral random field models for spatial lattice 
data. That is, we consider random fields where the index set for the variables is Z x Z - appropriate 

for image processing, for example. i i 

Research on spatial random fields dates back over 



Besag and Green 


( 


1993 


), anc 


found in 


Cressie 


( 


1993Ustein 



l alf a c entur y, e.g., s e e (IWhittle . 1954) 



Besag 



(|197l 119741 ). iGuvonl $1982 ). 



Rosenblatt 



19991 ) . Baneriee et alJ (|2004h . ICressie and Wiklel (|201lh . and 



the references therein. 

For a stationary Ga ussian random field, it is natural to impose a Markov structure, as described 



m 



Rue and Heidi (|2010l ). in order to obtain an inverse covariance matrix (i.e., a precision matrix) 



that has a sparse structu r e, be cause this will ensure speedy computation of maximum likelihood 



estimates. 



Rue and Heidi (|2010l ) show how careful specification of conditional distributions gener- 



ates a well-defined random field. However, this technique relies upon imposing a priori a sparse 
structure on the precision matrix, i.e., demanding that many conditional precisions be zero. In 
contrast, the cepstral random field does not generate a sparse covariance (or precision) matrix, 
and yet always yields a well-defined spatial random field. It is actually a general way of specifying 
a non-isotropic random field when we do not have prior notions about conditional variances or 
precisions, and in particular for fields that are not Markovian. 

The cepstral random field allows for unconstrained optimization of the objective function, i.e., 
each model coefficient can be any real number independently of the others; this appealing property 
is in marked contrast to other models and approaches, such as moving averages or Markov random 
fields (these require constraints on parameters to achieve identifiability and/or a well -defined pro - 
cess). Despite these advantages, the cepstral random field mo del, first de veloped by ISolol (|1986l ). 
has only played a minor role. In the development of this model, ISolol (]1986l ) presents estimation ap- 
proaches by both log periodogram regression and Whittle maximum likelihood. Additionally, based 
on information criterion, Mallow C p and hypoth esis testing, the author briefly describes methods for 
model selection. Broadly speaking, Solo ( 19861 ) presents an approach to modeling cepstral random 
fields, based on regression and Whittle maximum likelihood, and leaves the asymptotic properties 
of the estimators untreated. 

This paper provides a different approach to model fitting and develops the first comprehensive 
treatment of the theory for cepstral random field models. In particular, we establish recursive 
formulas for connecting cepstral random fields to moving average random fields, thus facilitating 
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efficient computation of the spatial autocovariances. Critically, the resulting autocovariance matrix 
is guaranteed to be postive-definite; note that if we were to work with a moving average (MA) field 
instead, it would not be identifiable without imposing further complicated parameter restrictions. 
Thus, given these autocovariances, the exact likelihood function for the cepstral random field model 
can be easily expressed as a function of the unconstrained parameters. 

Additionally, we develop asymptotic results for Bayesian, maximum likelihood, and quasi- 
maximum likelihood estimation of field parameters and regression parameters. In particular, we 
establish asymptotic consistency in both the Bayesian and likelihood settings and provide Central 
Limit Theorems for the frequentist estimators we propose. We discuss the computational advan- 
tages of the cepstral model, and propose an exact Whittle likelihood that avoids the burdensome 
inversion of the autocovariance matrix. Although our primary focus is on cepstral models, the 

s with regression effec t s. Ou r 



Mardia and Marshall 



41984 ). 



theoretical developments are presented for general random field mode 
results are of independent interest and extend the existing results of 
providing a rigorous framework for conducting model building and inference. 

As discussed in Sections [5] and El the proposed cepstral models are computationally advanta- 
geous over many current models (e.g., spatial autoregressive models), because no constraints need 
to be imposed on the parameters to insure the resulting autocovariance matrix remains positive 
definite. In fact, given the recursive formulas of Section [2 one can model the two-dimensional 
cepstral coefficients (i.e., the Fourier coefficients of the two-dimensional log spectrum) and arrive 
at the autocovariances without the need for dir ect Fourier inversion. 

Since the model's first inception (jSolol . I1986I ). the cepstral random field li terature has remained 



sparse, with relatively few examples to date. For example, Cressiej (|1993l . p. 448) makes brief 
mention of the model. In a dif ferent context. iNoh and Sold (|2007l ) use cepstral random fields to 



test for space-time separability. ISandgren and Stoical (|2006l ) use two-dimensional cepstrum thresh- 
olding models to estimate the two-dimensio nal spectral density. However , this work doesn't treat 
the random field case. Related to our work, iKizilkava and Kavranl ([2005 ) d erive an a l gorith m for 
computing cepstral coefficients from a known ARMA random field whereas IKizilkava! (|2007l ) pro- 
vides a recursive formula for obtaining a nonsymmetric half plane MA random field models for a 
given cepstral specification. In contrast, our recursive formulas provide unrestricted MA random 
fields as well as the necessary autocovariances for expressing the Gaussian likelihood. 

This paper proceeds as follows. Section [2] describes the cepstral model and its computation. 
Specifically, this section lays out the recursive formulas that are needed to estimate the autoco- 
variances given the cepstral coefficients. Section details the different model fitting methods, 
including Bayesian, maximum likelihood, quasi-maximum likelihood, and exact Whittle likelihood. 
In addition, this section describes extensions to missing data, imputation, and signal extraction. 
Our theoretical results are provided in Section [H Here, we establish consistency and asymptotic 
normality of the proposed estimators. Section [5] contains concluding discussion. For convenience 
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of exposition all proofs are left to the Appendix. 



2 The Cepstral Model and its Computation 



We begin by introducing some basic concepts about spatial random fields, and then we specialize 
to the cepstral rando m field, w i th a f o cus on compu t ation of autocovariances. Refer e nces on spatial 
random fields include 



Whittle (195 



(119931 ) . iKedem and Fokianosl (|2002T ). and 



|), iBesad (119721 . iRosenblattl (|l985L l200d ). ISolol (|1986T ). 



Rue and Held ( 



Cressie 



2010)- A random field Y = {Y r , s } is a 



process with indices on a lattice, which in this paper we take to be Z x Z. Typically a random field 
has a mean function = EY r>s , which may be modeled through regression variables (Cressie, 
1993). The mean-corrected field Y — {{i r ,s} will be denoted by W. 

Interest focuses upon stationary random fields - which in practice is often adequate once mean 
effects are identified and accounted for - whose joint distribution functions ar e shift inyariant 
When all moments are defined, this is equivalent to the higher-order cumulants (jBrillinger 



2001 



being dependent only on lags between the mean-centered variables. The second-order cumulant 
function, or autocovariance function (acf), is defined via Cov(Y riS , Y fli &) = E[W r)S W a) ;,] = 7 a _ r . i &_ s 
for all a, b,r,s 6 Z. It is convenient to summarize this second-order structure through the spectral 
density F defined on [— ir, tt] 2 , which depends on two frequencies. Letting Zj = e~ lXj for j = 1,2, 
the spectral density is related to the acf via the formula 



F(Ai,A 2 )= Yl lh,k( F ) z i z l 

h,k&Z 



(1) 



Here we write j(F) for the acf associated with the spectrum F, and it in turn is expressed in terms 
of F via Fourier inversion as 



lh,k( F ) 



£ £ F(X 1 ,X 2 )Z^ h Z- k dX, 



id\ 2 . 



(2) 



As a general notation, let the normalized double integral over both frequencies be abbreviated by 
the expression < • >, so that ^h,k{F) = < FZ^ Z% > is compactly expressed. Now it follows 
elementarily from the commutativity of the field Y variables that jh,k = 7-h,-k, and hence the 
corresponding F in (pQ) must have mirror reflectional symmetry through both ax es, i.e., F{X-\ , A 2) = 
F(— Ai, — A2). Furthermore, the acf of a rand om field i s alwa ys positive-definite (jCressid . Il993l ) and 
the corresponding spectrum is non- negative (jBochnerl . Il955l ). 

With these preliminaries, let us proceed to consider models of lattice data. A spatial model 
for continuous-valued random variables should - at a minimum - capture second order structure 
in the data, which is summarized through the acf. However, a putative acf may or may not have 
non- negative Fourier Transform (FT) ([1]), whereas any valid acf of a stationary field must have 
non-negative spectrum F. One way to ensure our model has such a valid acf is to model F - 
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utilizing some class of non-negative functions - and determine the correspond ing covariances vi a 
([2]). This is the philosophy behind the versatile exponential time series model of lBloomfieldl (| 19731 ). 
The idea there was to expand the log spectrum in the complex exponential basis functions, with a 
truncation of the expansion corresponding to a postulated mode l. 



The same idea is readily adapted to the spatial context; ISolol (j 19861 ) seems to be the first formal 



presentation of this idea. So expanding log F in each frequency concurrently yields 



The coefficie nts {0 



treatment of 



logF(A 1 ,A 2 )= ]T Q hk Z{Zl 

j,kez 

< log FZ 1 3 Z^ k > } are called the cepstral coefficients; see also the recent 



Kizilkava and Kavranl ( 20051 ). A pleasing feature of this representation is that F 1 



has cepstral coefficients {— 0j,fc}- By truncating the summation, we obtain a parametric model 
that can approximate the second-order structure of any random field with bounded spectrum. So 
we obtain the cepstral model of order (p, q) given by 

( V Q 

F(X 1 ,X 2 ) = exp\ Yl E ®i,k Z i Z 2 



> . 



(3) 



j=-pk=-q 

Note that the cepstral coefficient Oo,o has no sinusoidal function multiplying it, and hence exp @o,o 
quantifies the scale of the data. In one dimension, this would be called the innovation variance; 
note that 0o,o = < log-F >. Because the complex exponentials form a complete orthonormal basis 
set, it is impossible for two distinct values of to produce an identical function F; hence the model 
is identifiable. 



Further special cases of the general cepstral field model are considered in ISold (119861 ). Because 
F has mirror reflectional symmetry, the cepstral coefficients do as well, i.e., @j t k = ©— j,—k- We 
might consider a model completely generated by the positive orthant of the matrix, so that 
&jk = ®—j,k = ®i— k = 0— j— k- Or we might have a positive quadrant model, where 0,& is 
only nonzero when j, k > 0. Zeroing out other batches of coefficients produces special cases; e.g., 
a separable random field is obtained by imposing that only the axes of are non- z ero. A lso an 
approximately isotropic random field structure can be imposed, as described in ISolol (j 19861 ). 

In order to fit this model to Gaussian data, it is necessary to compute the acf from a given 
specification of cepstral coefficients. We next describe two approache s to t h is: on e is approximate, 
and the other is exact. Both differ from the fitting techniques in ISolol (|1986l ). who advocates 
an asymptotic likelihood (or Whittle) calculation. The first approach involves a straightforward 
discretization of ([2]) - together with ([3]) - utilizing the Riemann approximation. So long as the 
spectrum is a bounded function, this method is arbitrarily accurate (since the practitioner controls 
the mesh size). In order to accomplish the computation, without loss of generality let q = p, so 
that the cepstral coefficients are given by a (2p + 1) x (2p + 1) grid (if q < p, just fill in some 
entries of with zeroes). 
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Now we refer to the entries of via @j,k with — p < j,k < p, which is a Cartesian mode of 
indexing; this differs from the style of indexing pertinent to matrices. We can map this grid to a 
matrix [0] (and back), with the following rule: 

[©life = ®k-p-l,p+l-j ©mn = [0]p + i_ n)m+ p + i (4) 

for 1 < j,k < 2p + 1 and —p < m,n < p. We will consider a set of frequencies {jTr/M,kn/M} 
for —M < j,k < M, which is an order M discretization of [— it, it] 2 . Consider a complex-valued 
2p + l x2M+l matrix E with entries E jk = exp{m(p+l-j)(M-k+l)/M} for j = 1,2, ■ ■ ■ ,2p+l 
and k = 1, 2, • • ■ , 2M + 1. Then 

for 1 < r, s < 2M + 1 holds (proved in the Appendix). So we can evaluate F on the desired grid of 
frequencies by exponentiating each entry of the matrix E'\Q]E, according to ([5|). 

Next, consider the grid of autocovariances given by V = {lh,k}^k=-H ^ or some maximal lag H. 
Discretizing equation ([2]) yields 

' M + l-s M + l-r\ f ., M + l-sl f ., M + l-r 



r,s=l 
2A/+1 



7/, t ~ (2M + 1) 2 F { 7r , 7r ) exp < trih > exp < irik 

' ' K ' ^ \ M M / I M I I M 



(2M + 1)- 2 exp{^[e]^J M }G fc+ff+M G ff+ i_ Jfe>r , 

r,s=l 

£T + l-j)(M + l-k) 

w 



G jk = exp{7T« — } 



for 1 < j < 2H + 1 and 1 < k < 2M + 1. Here G is a 2H + 1 x 2M + 1 dimensional matrix similar 
to E. For applications, it is convenient to compute [r] directly as follows: 

2M+1 

[r], m = 7 m -M+w~(2M + r 2 Yl ^p{(e'[Q}e) }G ms G er , 



or in matrix form is given by 



[r] « (2M + l)" 2 Gexp{ J E / [0] J B}G'. (6) 



In this formula we have written the exponential of a matrix, which here is not the "matrix expo- 
nential" , but rather just consists of exponentiating each entry of the matrix (this is clear from the 
antecedent formula). So © produces an arbitrarily fine approximation to the acf (taking M as 
large as desired). The algorithm takes a given 0, produces [0] via @, computes E and G (ahead 
of time, as they do not depend upon the parameters), and determines [r] via ©. 



Now we present the exact method for computing t 



is similar to that of Section 3 of 



l e acf from the cepstral matrix. Our approach 



Kizilkava and Kavran 



(|2005l ). though with one important difference. 
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They present an algorithm for computing cepstral coefficients from known coefficients of an ARMA 
random field. Instead, we take the cepstral coefficients as given, compute coefficients of certain 
corresponding MA random fields, and from there obtain the acf. In order to fit the Gaussian 
likelihood, we need to compute the acf from the cepstral matrix, not the reverse. 

First, consider decomposing the cepstral matrix into four quadrants, plus the four semi-axes, 
plus the origin (Kizilkaya (2007) has a similar decomposition). In the following treatment, we can 
let p = oo for a general decomposition, or take p < oo for the purposes of modeling: 



E ®i,kZiz% = e ,o + E @ ^ z i z 2 + E z i J/ -i '' (t) 
j,k=~p j,k=i j,k=i 

j,k=i j,k=i 
+ ^e J , o ^ + E o. fcZ 2- 

jytO k^O 

We have used some of the simple reflection properties of the cepstral matrix for this decomposition. 
Also terms are grouped appropriately. We introduce the device of a "causal" field and a "skew" 
field as follows. The causal field is a MA field that only involves coefficients with indices in the 
positive quadrant, whereas the skew field essentially is defined over the second quadrant. More 
precisely, we have 



m,n>0 



(8) 



E ^,k Z l Z 2 



for the causal field. The causal field may be written formally (in terms of backshift operators 
Bi,B 2 ) as ^(Bi,B 2 ) = J2j k>o ^3,^1^2 • That is, the ipj^ coefficients define the moving average 
representation of the causal field, and {jr,s(^)} is its acf. It is important that we set ^0,0 = 1- 
Similarly, let <$>(Bi, B2) = fc>0 (^j^B^ 3 B% for the skew- field, which in the first index depends on 
the forward shift operator B^ , but on the backshift operator B2 in the second index. Also 



E M ' Z " 



7r,»($) = E 
m,n>0 



(9) 



j,k>0 



We also have two one-dimensional random fields, corresponding to the axes of the cepstral matrix, 
given by E(Bi) = Y,k>o€k B i and Q(B 2 ) = J2k>o u kZ 2 , which have acfs -y h (E) = J2k>o^k+h and 
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7/i(f2) = X^fc>o u k^k+h respectively. Now each of these MA random fields has a natural cepstral 
representation, each of which can be made to correspond to a term of ([7]): 



^j,kZizl = exp{ j2 Qj,kZ{Z*} 
j,k>0 j,k=l 

p 

hkZ^ J Zl = exp{ J2 ®~j, k Zi j Z$} 

j,k>0 j,k=l 

j>0 j=l 
P 

Y u i Z 2 =exp{^e 0li ^}. 



(10) 

(11) 
(12) 

(13) 



By expanding the exponentials as power series on the right hand side of (|10|) , we see that Z\ and Z 2 
both occur at least once in every term; hence we must have "00,0 = 1, "0o,i = 0, and ^1,0 = 0. The 
other coefficients V^fc have no constraints, but are determined by the cepstral coefficients. Similarly 
0o,o = 1, ^0,1 = 0, and ljO = 0, while £ = 1 = w . 

Comparing these associations with (|7|), we see at once that the spectrum is equal to the magni- 
tude squared of the product of the above factors (fTUj) , (fTTj) , (fT2"j) , and (fT3"j) , along with the constant 
e ©o,o _ Taking the product of the corresponding squared magnitudes yields 



r,s,a,b,m,ni 



e eo,o J- £ -M*) 



Y lj+a-m,k-b-n{^)lm{^)ln{^) 



Z{Z. 



j ryk 



Matching coefficients (i.e., computing the inverse FT) then produces the acf of the cepstral model: 



Y 7j+a-m,fc-6-nW7m(S)7n(^) 



(14) 



This produces the model acf in terms of the intermediary acfs 7(^), 7(^), 7(^) 5 arid 7(^2). To 
make the algorithm complete, we must obtain the field coefficients in terms of known cepstral 
matrix coefficients. Differentiating fjlOf) with respect to Z\ yields 



Y ^j,kjZ{ X Z\ = J2 ( J2 i>l-m,h-n®m,nm ] Z^Z, 
j,k>0 



h,l>l \m,n>l 



as a formal expression, after recollecting terms. Then matching terms, we find a recursive relation 
between ijjj^ (where both j and k are positive) and m , n , given below. Similar calculations for the 



S 



other associations (fTTj) . (|T2|) . and (|T3j) yield 

1 P / fc \ 

V'j.Ai = ~ ^2 771 [ ^2 ^j-m,k-n®m,n I (15) 
J m=l Vn=l / 
1 P / fe \ 

^J.fc = ~ X] 171 I X] ^-m,fc-n6-m.,n ) (16) 
3 m=l \n=l / 

1 P 

£j = ~ m©m,0^-m (17) 

m=l 

1 P 

c^j = - meo, m Wj-m, (18) 

m=l 

for j > 1 and > 1. The initializations of the recursions for indices j = or k = are described 
above. These are recursive formulas. In the causal case, one would compute ipi t i,ip2,i, • • • , ^p,i, ^1,2, ^2,2, 
etc. Alternative computational patterns could be utilized, noting that ^ only requires knowledge 
of ipi h with £ < j and h < k. The full algorithm for the acf is then: 

1. Use the recursions (|15|) . (|16p . (|17p . and (|18|) to obtain the various moving average random 
field coefficients, to the desired level of approximation. 

2. Compute acfs for the causal, skew, and one-dimensional random fields, via ([8]) and ([9]), etc. 

3. Compute the final acf via (fbi|) . 

When p = 00, this gives the precise mapping of cepstral coefficients to various MA coefficients, and 
ultimately to the autocovariance function. If p < 00, it provides an algorithm for determining auto- 
covariances for a given cepstral mode l . The se formulas are already much more complicated than in 
the time series case (see iPourahmadil ()1984l )). and for higher dimensional fields become intractable. 
When p is small these exact recursions may be less burdensome; otherwise the approximate method 
based on discretizing ([2]) may be preferable. 



3 Model Fitting Methods 

In this section we give additional details on various methods for fitting cepstral random field models, 
and present some tools for refining specified models. Once a model is specified, we can estimate the 
parameters via exact maximum likelihood, Bayesian posterior simulation, an approximate Whittle 
likelihood, or an exact Whittle likelihood (described below). Whereas other methods of estimation 
can be conceived, we focus on these four techniques due to their mixture of being flexible and 
possessing good statistical properties. Most of our treatment holds for general stationary random 
fields, but when a result only holds for the cepstral random field we will make explicit mention of 
this. 
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The first two approaches require the exact Gaussian likelihood, whereas the third approach uses 
an approximation. Once model parameters have been estimated, in the frequentist paradigm we 
may be interested in refining our model, which we might approach by determining whether certain 
cepstral coefficients are significantly different from zero. If they are not, we can easily fix these 
coefficients to zero (it is trivial to do this sort of constrained estimation with the cepstral model) 
and rerun our estimation. 

In this section, we first define Kullback-Leibler (KL) discrepancy, the exact Whittle likelihood, 
and the quasi- maximum likelihood estimate (QMLE). Then we proceed to describe Central Limit 
Theorems for the maximum likelihood estimates (MLEs) and QMLEs, which are novel results in 
the random field literature, under an expanding domain asymptotic theory. These results, proved 
for fairly general linear random fields with regression effects, are then specialized to the case of the 
cepstral field, and the backward deletion mo del refin e ment procedure is described. 

While the MLE procedure is discussed in ICressid (|1993l ) and other works, there are few precise 
mathematical re sults that provide Central L imit Theorems for estimates. In the prior literature, 
we observe that Mardia and Marshall ( 19841 ) do indeed establish efficiency of the MLE when the 
random field is Gaussian, under a condition on the trace of the covariance matrix. In fact, we provide 
(see Section 2]) s ufficient conditions under wh ich our Lem ma [1 yields the validity of condition (iii) 
of Theorem 2 of iMardia and Marshall! ()1984l ) . Although I Sold (|1986l ) advocates the approximate 



QMLE method in practice, asymptotic results are not proved in that paper. Therefore we believe 
that the theoretical results of Section 2] are novel and useful, filling a gap in the spatial statistics 
literature and allowing a disciplined approach to model building and infer ence. 



There is also an existing literature on Discrete FTs for random fields (see lBandvopadhvav and Lahiri 

201ol ) under various asymptotic schedules, including in-filling asymptotics, expanding (template- 



based) asymptotics, and mixtures of both. Whereas it would be intriguing to extend our theoretical 
results to these frameworks, we have not attempted to prove such results in this paper. For one 
thing, in-fill asymptotics require a continuous-parameter random field, and the cepstral approach 
requires non-trivial modifications from the lattice situation (the complex exponentials will not span 
the space of log spectra). For another, the results given here are sufficient to guide inference for 
regular lattice data, so that only exotic shapes and irregular sampling are excluded. 

Now we proceed to discus s spatial modeling, adapting the vector time series treatment in 
Taniguchi and Kakizawa Again, we do not assume a cepstral formulation in what follows. 



Since we want to formulate asymptotic results for parameter estimates, we will speak of our data in 
somewhat more idealistic terms for the purposes of formulating precise mathematical statements. 
So we will suppose that our data comes to us in gridded form, corresponding to a N x N matrix 
(with denoting the de-meaned version). This presumes a square domain, which however is 
only necessary for formulating the asymptotic results below. Rectangular domains can be handled 
by taking the smallest circumscribing square, and utilizing the asymptotic results for this super- 
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square. In other words, we might let N be the maximum of the length and width of the rectangle, 
and apply the asymptotic results (this is reasonable if the dimensions are roughly equal) . 

Now Y N and W N can be vectorized into length A 2 vectors Y and W via the so-called lexico- 
graphical rule 

Y k = Y^ s W k = W*, k = N(r - 1) + a. 

Here that Y = vec(Y N '), where vec stands for the vector operation on a matrix. Note that 
i — 1 = k div N and s = k mod N. Also let fx = EY, so that [i k = EY^ = EY^, = [i r>s . In the simplest 
scenario the mean matrix {/J, r ,s} is constant with respect to the indices r, s. More generally, we might 
model the mean through regressor functions defined upon the grid, i.e., fj, TiS = Yle=i PgX^r, s ) f° r 
some specified lattice functions {Xg}^_ v Then 

L L 

fi k = ^2/3 e X e (kdivN +l,kmodN) = ^ft i Xt{k) 
i=\ i=\ 

maps each X# from a lattice function to a function Xg of the natural numbers. The parameters 
Pu(h,'" then enter the regression linearly, and we can express things compactly via fj, = X/3, 
where X is the regression matrix with columns given by the various Xg. 

The spectral density of a mean zero random field W has already been defined in ([1]), and the 
discrete FT of the field is now defined as 

N N N N 

W(A!,A 2 ) = E Wj^e-^e-*** = E E W N {tl _ l)+t2 Z^ Z% 

*1 = 1*2 = 1 h =1*2=1 

for Ai,A2 G [— vr, 7r]. Note that we define this FT over all pairs of frequencies, not just at the 
so-called Fourier frequencies {irj/N}. Also the FT depends on ft through the mean-centering; if we 
center the data Y N by any regression parameter other than the true ft, some bias will be introduced. 
The periodogram will be defined at all frequencies, and is proportional to the squared magnitude 
of the FT: 

Wi,A 2 ) = iV- 2 |W(A 1 ,A 2 )| 2 = ]T lhlM {I)Z h ^Z^. 

\h x \<N ,\h 2 \<N 

Here 7hi,h 2 (-0 i s the sample acf. Note that our notation 7/ ll .h, 2 (I) is consistent with ([2]), i.e., the 
sample autocovariance is the inverse FT of the periodogram Ir. We also will consider an unbiased 
acf estimate given by 

lhlM ~ (N- |/ii|)(A- \h 2 \) 7hlM ^- 
We emphasize that the computation of this periodogram requires a choice of ft, and so is written 
Ir. This can be used to assess the frequency domain information in the random field along any 
row or column ; the periodogram can also be viewed as a crude estimate of the spectral density F 



(Cressie, 



19931 ). 
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Model fitting can be performed and assessed thro ugh t h e Ku llbac k-Leibler (KL) discrepancy, 
just as with time series. Although KL is mentioned in I Sol J ( 1986 ) and Cressie (1993), we provide 
an in-depth treatment here - see Lemma [21 for example. If F and G are two (mean zero) random 
field spectral densities, their KL discrepancy is defined to be 

KL(F, G) = < log F + G/F > . 

This is a convenient mechanism, since KL is convex in F. As f3 parametrizes mean effects, we let 8 
be a parameter vector describing the second order structure. If the true data process has spectrum 
F, and we utilize a model with spectrum Fg, then KL(Fg, F) can be used to assess proximity of 
the model to truth. The convexity of KL guarantees that when the model is correctly specified, 
the true parameter 6 minimizes th e discrepancy. When the mod el is misspecified, the minima 
9 are called pseudo-true values (cf. Taniguchi and Kakizawa . 200ol ). For the cepstral model, the 



parameter vector is 6 = vec[0]. The full parameter vector is written cf>, where 4>' = [6',/3']. 

It is natural to use KL to fit models as well. For this, consider KL(Fg, la) - which is called 
the exact Whittle likelihood - and minimize with respect to 6, which produces by definition the 
estimate Oqmle- Then using ([2]) we obtain the practical expression 

N N 

KL{Fg,I 3 ) = < log Fg > +N~ 2 Yl WNiti-y+taWNfa-y+iz-ysi-^n-taiFf 1 ). 

tl 1*2=1 Sl,S2=l 

This assumes that the correct regression parameters have been specified. In the case that Fg is 
a cepstral spectrum ([3]), the above expression is even easier to compute: < log Fg > = 0o,o and 
^(Fq 1 ) = 7(F_g), i.e., multiply each cepstral coefficient by —1 to obtain the acf of F^ 1 from the 
acf of Fg. 

The KL discrepancy can also be written as 

<logF e >+ l^M^hhtM^i 1 )- 
\hi\<N,\h 2 \<N 

Unfortunately, 7/i 1 ,/i 2 (-0 is biased as an estimate of 7/ ll; / l2 (i ? ), and this has a nontrivial impact for 
spatial data, though not for time series. Essentially, the presence of "corners" in the observed 
data set reduces the number of data points that are separated by a given lag (/ii,/i2)j if either of 
| h\\ or \fi2\ is large, we have a very biased estimatj^]. This effect is more pronounced as dimension 
increases. For this reason, we propose using jh lt h 2 instead of 7h 1 ,/i 2 (-0> because E% 1) ^ 2 = 7/ llj / l2 (F). 
Let us call the modified KL(Fg, Ip) by KL(Fg): 

KL(Fg) =< \ogFg > + J2 % 1 ,h a -yh 1 ,h a { F e 1 )- 

\h-L\<N,\h2\<N 



The impact of corners can be visualized by comparing the volume of a d-dimensional cube with that of an 
inscribed ball; the ratio is n/(2d) for d > 2, which tends to zero as d increases. Thus, corners increasingly dominate 
the region as d increases, which interferes with one's ability to measure correlation as a function of lag. 
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Using this criterion instead will produce asymptotically normal cepstral parameter estimates, and 
therefore is to be preferred. 

If even faster computation of the objective function is desired, we may discretize KL and 
utilize values of F directly, without having to compute the inverse FT 7(i ? e ~ 1 )- The result is the 
approximate Whittle likelihood, denoted KLn, and is obtained by discretizing the integrals in 
KL(i ? e, Ip) with a mesh corresponding to Fourier frequencies, but replacing Ip with the FT of the 
IhiM sequence, denoted by Ip. Note that Ip may not be positive at all frequencies, because the 
unbiased autocovariance estimates is no longer guaranteed to be a positive definite sequence. This 
is the price of adjusting for bias. So the discrepancy is 



KL N (F e 



N~ 




log Fg (it j x N \irj 2 N x ) + 



I^hN-\TTj 2 N- 1 ) 
F e (7rj 1 N-\nj 2 N- 1 ) 



(19) 



which can be minimized with respect to 9. The resulting estimate has asymptotic properties 
identical to the QMLE, and in practice one may use either KL or KLn according to computational 
convenience. It will be convenient to present a notation for this double discrete sum, which is a 
Fourier approximation to < • >, denoted by < • >tv; then KLn{Fq) = < logFg + Ip/Fg > N . 

On the other hand, we can also compute the exact Gaussian likelihood for the field. Let the 
covariance matrix of be denoted ^(F), which is defined via = EVFVF'; the resulting 

block- Toeplitz structure of this matrix is analyzed in Section |H The entries of this matrix can be 
determined from F via the algorithms of Section [21 along with careful bookkeeping. A model for 
the data involves a spectrum Fg - let the associated block- Toeplitz covariance matrix be denoted 
S(-Fe) - which is hoped to be a suitable approximation to 'E(F). Then the log Gaussian likelihood 
is equal (up to constants) to 



£(9,/3) = -.5 log \E(F e )\ - .5(Y - Xpj?r\F e )(Y - X(3). 



(20) 



Maximizing this function with respect to 9 yields the MLE 9mle\ a - 
eralized Least Squares (GLS) estimate by standard arguments (see 



so 0mt,f, is given by the Ge n 



Mardia and Marshal] 



(|1984m 



MLE 



-l 



)Y, 



(21) 



9 MLE' \ ^ SmLE' 

which expresses the regression parameter in terms of 9mle- For the computation of (j20|) we must 
calculate the acf corresponding to Fg, which can be done using the algorithms of Section [2J 
Given these notations, we can also re-express the formula for KL more compactly: 



KL(Fg,Ip) = < logFg > +N- 2 (Y - Xp) S(F_ e )(y - X/3) 



(22) 



Contrast this expression with (|20p ; they are similar, the main difference being the replacement of the 
inverse of S(i^) by ^(F^ 1 ) = T,(F_g). We propose using (f2"2"j) to estimate regression parameters, 
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but is to be determined by KL. The formula for the regression QMLE is then 

-i 



Pqmle 



X'^FZ 1 )x 

Sqmle 



X'SOFT 1 )Y, (23) 

"QMLE 



where Oqmle minimizes KL(Fq), which in turn depends upon Pqmle through la. These formulas 
don't apply when we use the approximate Whittle, although the same asymptotic properties will 
hold as for the exact Whittle. 

Most prior literature on random fields seems to utilize approximate Whittle estimation, or 
QMLE, since the objective function is quite simple to write down. However, there is some bias 
in the model parameters QMLEs when 7/ l (i") are used as acf estimates, which precludes a central 
limit theorem and thereby interferes with model-building; this is a dimension-dependent bias that 
is asymptotically irrelevant for time series, but pertinent for spatial data. On the other hand, 
the parameter MLEs don't have this problem, but require more effort to compute. We can use 
the approximate algorithm given by equation (|6|), together with (|20p . to compute the MLEs. The 
QMLEs based on unbiased % acf estimates are faster to compute than MLEs (no matrix inversions 
required), and enjoy the same asymptotic normality and efficiency. 

However, if one prefers a Bayesian estimation of (and (3), it is necessary to compute expC(9, (3), 



which is proportional to the data likelihood p(Y\9,/3). The posterior for 9 is proportiona 



likelihood times the prior, and one can use MCMC methods to approximate p(9\Y) (jGeweke 



to the 



20051 ). 



Recall that the mean of this distribution, which is the conditional expectation of 9 given Y, is called 
the posterior mean, and will be denoted Ob- 

Note that equations (|2ip and Q23|) can each be used in an iterative estimation scheme. To 
determine the MLE, minimize (|20|) to obtain an estimate of for a given f3 computed via (|2ip: 
then update (3 by plugging into ([2~T]) . and iterate. For the QMLE, demean the data by computing 
Y — X/3 and determining la, for a given /3, and then minimize either KL or KLn to obtain 
estimates (either exact or approximate); then update (3 by plugging into (|23p and iterate. From 
now on, we refer to these estimates as the exact /approximate QMLEs (if using biased acf estimates 
7^(7), only consistency holds, and not asymptotic normality). 

We can now provide a description of the asymptotics for the various estimates; a rigorous 
treatment is given in Section HI with formal statements of sufficient conditions and auxiliary results. 
Firstly, the Bayesian estimates Ob and (3b are consistent when the data is a Gaussian random field 
that satisfies suitable regularity conditions (Theorem [2]). Secondly, in the Frequentist case we 
can say a bit more about the asymptotic distribution. Recall that the full sample size is N 2 , so 
a Central Limit Theorem result requires scaling by TV = V N 2 . Let the Hessian of the KL be 
denoted H(0) = W'KL(Fq,F), which will be invertible at the unique pseudo-true value by 
assumption. Then the exact QMLE, approximate QMLE, and MLE for are all consistent, and 
are also asymptotically normal at rate N with mean and variance .ff _1 (#)y(#)ii^((9), where f 
denotes inverse transpose and V(0) = 2 < F 2 VF fl _1 V'F fl _1 >. (This assumes that the fourth-order 
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cumulants are zero; otherwise a more complicated expression for V results, involving the fourth- 
order spectral density.) The estimates of the regression parameters are asymptotically normal and 
independent of the 9 estimates (when the third-order cumulants are zero), for all three types of 
estimates. 

These theoretical results can be used to refine models. Typically, one uses these types of asymp- 



totic results under the null hypothesis that the model is correctly spe cified, so that 6 is t 



par ameter and V = 2 < V l og Fq ■ V log Fq >, which equals twice H. See iMcElrov and Holan 



and 



l e true 

pood ) 



McElrov and Findlevl (|2010i ) for more exposition on model misspecification in the frequency 



domain. Thus the asymptotic variance is twice the inverse Hessian, which has a particularly elegant 
form for the cepstral model. The gradient of the log spectrum is in this case just the various Z\_ 
or so that as in the time series case the Hessian equals twice the identity matrix (because of 
mirror reflectional symmetry in 0, there is a doubling that occurs). Thus there is no redundancy 
in the information conveyed in the parameter estimates, a type of "maximal efficiency" of the cep- 
stral model. Hence the asymptotic variance is just 4/V -2 times the identity matrix, which makes 
computation of the standard errors trivial. 

In terms of model-building with cepstral random fields, one procedure is the following: postu- 
late a low order cepstral field model (e.g., order p = 1) and jointly test for whether any coefficients 
(estimated via MLE or QMLE) are zero. We might consider expanding the model - in the direc- 
tion of one spatial axis or another as appropriate - if coefficients are significantly different from 
zero. This type of forward addition strategy would stop once all additional coefficients are negli- 
gible. Alternatively, one could start with a somewhat larger cepstral model, and iteratively delete 
insignificant coefficients. 

Gaussian Likelihood Ratio test stat i stics can be utilized for nested cepstral models, along the 
lines given in iTaniguchi and Kakizawa ( 2000 ) - which ultimately just depend on the asymptotic 
normality of the parameter estimates - in order to handle batches of parameters concurrently. 
Model selection and assessment can also be assisted by examination of spatial residuals, which are 
defined by applying the inverse square root of the estimated data covariance matrix S(i^) to the 
vectorized centered data W - the result is a vectorized residual sequence, which should behave like 
white noise if the model has extracted all correlation structure. Note that examining whiteness of 
the vectorized residuals is equivalent to looking at all spatial correlations of the spatial residuals 
defined by undoing the vec operation. 

Extensions to missing data, imputation, and signal extraction are possible, as discussed below. 
Suppose that there is missing data in our sample of the field. We can always express this situation 
as Z = JY, where Z is now our observed data vector of dimension iV 2 — T, where T is the 
number of missing observations in the field, and Y is the vectorized random field, as described 
above. J is a selection matrix with ./V 2 — T rows and N 2 columns, which eliminates any entries 
of Y — corresponding to omitted observations in Y - for which no measurements exists. Then the 
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covariance matrix of Z is JH(F)J' , which is easily computed from any given parameter specification. 
The quadratic form in our Gaussian likelihood is then (Z — J /j,)' [JTi(F) J'] [Z — J/x); a matrix of 
this form is always invertible. It is obtained by crossing out rows and corresponding columns of 
E(-F), and contracting the result. 

Signal extraction for random fields can be handled as follows. We suppose that our random 
field Y consists additively of signal and noise, i.e., 

Y = § + N. 

The signal random field § and noise random field N may each be given by cepstral field models, but 
are assumed to be completely independent of one another. Each may also have their own regression 
mean effects. Let the cepstral coefficient grids of each be denoted 0§ and 0n- We can compute 
the acf for each component via the methods outlined in the previous section, obtaining 7§ and tn- 
In turn we can determine the vectorized versions of signal and noise, denoted by S and N, with 
covariance matrices Eg and S^v- With Y = S + N equal to the vectorized data, the quadratic form 
in the Gaussian likelihood is (Y — fJ.)'(X,s + T,n)~ 1 (Y — fi). So this is easy to calculate given the 
parameters 0§ and Opj. The distribution of the signal conditional on the data is Gaussian with 
mean and variance 

E[S] + + Z N y\Y - fj) S s - + Stv)" 1 ^- 

Of course, the conditional mean is for the vectorized field 5; in order to get our estimate of §, we 
reverse the vectorization mapping. Also, if there is missing data and Z = JY is our observation 
vector, the likelihood uses the quadratic form [Z — J fi)' (JTisJ 1 + JY>nJ')~ 1 (Z — Jfi), and the 
conditional mean will now be 

JE[S\ + Z s J'(J^sJ' + J^nJ'T 1 {Z - Jfi). 

The mean K[S] is determined in practice by assigning some of the fixed effects in X to the signal. 
For example, if the signal is defined to be a smooth "trend" effect and the noise is idiosyncratic 
noise, then all mean effects might be assigned to the signal so that E[S] = [i. 



4 Theory of Inference 



This section provides rigorous mathematical results regarding the inference problems delineated 
in Section [3l We do not assume a cepstral random field process, retaining greater generality, but 
assume a fair amount of regularity on the higher moments of the field through the Brillinger-type 
cumulant conditions (|Brillingerl . l200ll ) . 



We begin with an important Lemma that extends Lemma 4.1.2 of iTaniguchi and Kakizawa 



(|2000l ) to the spatial context. First we define a block- Toeplitz matrix T,(F) associated with spectral 
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density F to be N 2 x iV 2 -dimensional with jfcth block given by the N x ^-dimensional matrix 
T,(Fj-k), which is defined as follows. If we integrate over the second variable of F we obtain a 
function of the first frequency: 

i ? h(Ai) = ^ j* F(X 1 ,X 2 )e ihX2 dX 2 . 
Then is the matrix of inverse FTs of Fh, namely with jkth. entry given by 

Ihj-k(F) = ^f f F(X 1} X 2 y hX * dX 2 e^~ k ^ dX 1 = < FZ^Z^ > . 

Based on how we have defined and W, it follows that S(-F) = E[WW'], where F corresponds 
to the Data Generating Process (DGP). That is, lexicographical ordering of a stationary field 
produces this structure in the covariance matrix. Also let T denote the set of admissible spectra 
for two-dimensional random fields, defined as follows. For any spatial autocovariance function 
{jh,k}, consider the sums = J2k l fc ll7h,Jfc|> S k = 52 h Hl7/i,fc|, and S* = Y^k,h \ k \\ h \\lh,k\- Then 

JF= {F : [-tt,7t] 2 -> K+,F(Ai,A 2 ) =^ 7M ^Z|, S+ < 00V/1, < ooVA;, 5* < oo}. 

h,k 

Note that this class excludes spectra with zeroes, which is a minor imposition in practice. 
Lemma 1 Let and be block- Toeplitz matrices with F,G £ J- . Then 

N-Hr{Il iFjG)eTxT E(F)^-\G)} = < U^g^^FG- 1 > +0(N~ 2 ). 
Next, we discuss a Lemma that provides a Central Limit Theorem for weighted averages of the spa - 



tial periodogram, which is a natural extension of Lemma 3.1.1 of lTaniguchi and Kakizawal (|2000l ). 
We take Brillinger's approach, stipulating summability conditions on higher order cumulants of 
the spatial field. Let us denote an integer- valued bi-variate index by t € Z 2 , which has integer 
coordinates (ti,t 2 ). Then a collection of spatial variables can be written {W t (i> , W t (2) , • • • }• The 
strict stationarity condition stipulates that joint distributions of such variables only depend upon 
differences between indices: — t^ = (tj — ,t 2 — t^ ), etc. If we sum a function with 
respect to t € Z 2 , the notation refers to a double sum over t\ and t 2 . A similar notation is used for 
frequencies A S [— vr,7r] 2 , in that A = (Ai, A2). 

The next result presumes that spatial data is sampled from a true spatial field with spectrum 
F, and that we have a collection of continuous weighting functions Gj : [— tt, it] 2 h-> M + . The 
second-order cumulant function of the spatial field is the autocovariance function 7^ with I16Z 2 , 
whereas the fourth-order cumulant function is denoted 

7hAe = cum [W tl Wt+fc, W t+fc , W t+ i] . 
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We require absolute summability of these second and fourth order cumulant functions. Then the 
fourth order spectrum is well-defined via 

FF(X, u,€) = ^2 j h>k j exp{-iA ■ h — iw ■ k — i£ ■ £}, 

h,k,£ 

with • denoting the dot product of bi-variate vectors. In order to establish the Lemma, we impose 
more regularity via 

£ (l + W\W\-W\) l7 fc (i), fc P),..^)l <oo, (24) 
hW,h( 2 \-M k) 

where t denotes the product of the components of t. This will be referred to as Condition B k , 
for any k > 1. Finally, recall that the periodogram is computed from a sample of size N 2 . When 
the regressors are correctly specified, we will write (3 for the true parameter. Then i~ denotes the 
periodogram of the data Y correctly adjusted for mean effects; equivalently, it is the periodogram 
of W N . 

Lemma 2 Suppose that Condition B k \2J$ holds for all k > 1, and that Gj for j = 1, 2, • • • , J are 
continuous functions. Then both < Gjl^ > and < Gjl^ > have the same asymptotic behavior, 
and both converge in probability to < GjF >. Moreover, 

N{<G 3 {I~p-F) >} J i=i ^Af(0,V) 

as N — > oo. The covariance matrix V has jkth entry 

— U f G j (X)G k (uj)FF(X,-u,u)d\du;+ < (GjG k (—) + G 3 G k )F 2 > . 
(2ir) J[-tt,tt] 4 

Also, < GIp > and < GI^ >^ both converge in probability to < GF >, but N < Gj(EIp — F) > 
does not converge to zero in probability. 

The last assertion of Lemma means that the asymptotic normality result for < Gj (I-s — F) > 
cannot be established; as pointed out in Guyon (1982), there remains a degree of asymptotic bias 
that precludes the central limit theorem from holding. However, < Gj(I^ — F) > does not suffer 
from this bias problem. Both lemmas are important preliminary results for our main theorems, but 
also are of interest in their own right, extending known time series results to the spatial context. 
Although generalizations to dimensions higher than two seem feasible, the actual mechanics become 
considerably more tricky. Now the key assumptions that we require for our stationary random field 
are the following: 

Al: F6J. 

A2: The spectral density Fg is twice continuously differentiable and uniformly bounded above 
and away from zero, and moreover all components of Fg, Vi 7 ^, VV'i 7 ^ are in T . 
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A3: The Bollinger conditions - given by equation (|24p - hold for all k > 1. 
A4: The pseudo-true value 9 exists uniquely in the interior of the parameter space. 
A5: H{9) = WKL(F e ,F) is invertible at 9. 

Conditions Al, A3, and A5 cannot be verified from data, but some assumptions of this nature 
must be made to obtain asymptotic formulas. Condition A2 will hold for cepstral models (and 
other random field models as well) by the following argument. The coefficients of the causal and 
skew fields will have exponential dec ay in either ind ex argument, by extensions of the classical 
time series argument (see for example iHurvichl (|2002l )) applied to (|15p and (|16p . (The time series 
argument can be directly applied to (|17p and (|18[) as well.) Combining these results using ()14|) . the 
acf of the cepstral field will also have exponential decay so that Fg £ T . Of course, another way to 
verify this condition is to examine the boundedness of partial derivatives of the spectrum; at once 
we see that A2 holds for the cepstral model. 

Although condition A4 may be problematic for certain moving average models (which may have 
complicated constraints on coefficients), the cepstral model uses no constraints on the entries of G, 
i.e., any entry of the matrix can be any real number, independently of all other entries. Euclidean 
space is open, so any pseudo-true value is necessarily contained in the interior. Also, existence 
of a pseudo-true value is guaranteed by convexity of the KL discrepancy. We now state the limit 
theorems for our parameter estimates. For the QMLE estimates, we suppose that they are either 
exact or approximate Whittle estimates defined using the unbiased acf estimates. 

Theorem 1 Assume that conditions Al through A5 hold, and that the third and fourth order order 
cumulants of the field W are zero. Assume the regressors are correctly specified - where /3 the true 
regression parameter vector - and the regression matrix satisfies (X'X) — > 0, the zero matrix, as 
N — > oo. Then in the case of MLE or the QMLE, both (5 and 9 are jointly asymptotically normal 
and independent, with distributions given by 

AM (o, MZ\9) \x'E(F,)E(F)E(F,)x] MUd) 



N(e-tj ^rtfo,^ 1 {9)V{9)W{9)^ 

M x (9) = [x'Y,(F_ e )X 

H{9) = VV'KL(F e ,F) 

V{9) = 2 < PVF^V'F^ 1 >, 

where f denotes an inverse transpose. 

Remark 1 The condition on the fourth order cumulants can be relaxed, in which case the expres- 
sion for V is more complicated, involving the fourth order spectrum FF. We state our theorem 
with the simpler result, as this form tends to be used in practice in the time series case. If the 
third order cumulants are nonzero, there can be asymptotic correlation between (3 and 9. 
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Remark 2 Application of the same techniques in the case of a one-dimensional random field, or 
time series, yields asymptotic normality of regression and time series parameters under Brillinger's 
conditions. To our k nowledge, the only other results of this flavor for time series with regression 
effects is the work of iPiercd (|197ll ). which focuses on ARIMA models but allows for skewed non- 



Gaussian distributions. 

For the result on Bayesian estimation, we will assume that the model is correctly specified, 
which is fairly typical in the literature on Bayesian consistency. As the data likelihood is Gaussian, 
it is consistent to assume that the data is also Gaussian. The model must be identifiable, i.e., 
Fg 1 = Fg 2 implies Q\ = 62, which helps ensure asymptotic concentration of the likelihood. We 
assume the parameters belong to some compact subset of Euclidean space, and the true parameter 
vector lies in the interior. This assumption can often be accomplished by prior transformation (and 
is easily accomplished for the cepstral coefficients in the cepstral model). 

Theorem 2 Assume that the data process is Gaussian and satisfies A2, and the model is correctly 

specified and is identifiable. Suppose that the true parameters lie in the interior of the compact 

parameter space. Also suppose that the regressors are correctly specified - where j3 is the true 

regression parameter vector - and the regression matrix satisfies N~ 2 X'T I ~ 1 (F@)X — > M{6), a 

positive definite matrix that has two-norm uniformly bounded in 9 away from zero and infinity. 

^ P ~ ^ p ~ 
Then j3p — > p and 6p — > 0. 

It is worth comparing the conditions of the two theorems. In Theorem [2] the assumption of a 
correct model makes Al automatic, and the Gaussian assumption makes A3 automatic. Further- 
more, the assumption in Theorem [2] on the parameters - together with the assumption of a correct 
model - automatically entails A4 as well. Theorem [1] also assumes A5, which is chiefly needed to 
establish asymptotic normality of the frequentist estimates. The Bayesian result requires a slightly 
stronger assumption on the regression matrix in order to get asymptotic concentration of the like- 
lihood. For example, if we seek to estimate a constant mean by taking X to be a column vector 
of all ones, then M{9) exists and is just the scalar F_g(0, 0); this will be bounded away from zero 
and infinity in the cepstral model if all the cepstral coefficients are restricted to a range of values. 



5 Conclusion 

The general modeling approach and asymptotic theory we propose extends the spatial random 
field literature in several directions. By providing recursive formulas for calculating autocovari- 
ances, from a given cepstral random field model, we have facilitated usage of these models in a 
both Bayesian and likelihood settings. This is extremely notable as many models suffer from a 
constrained parameter space, whereas the cepstral random field model imposes no constraints on 
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the parameter values. More specifically, the autocovariance matrix obtained from our approach is 
guaranteed to be positive definite. 

Further, we provide extensions to accommodate missing data and to conduct signal extraction. 
In doing so we provide a comprehensive approach for modeling spatial lattice data. Also, we provide 
several approximate methods for speeding up computation. Importantly, this makes our approach 
increasingly practicable in high dimensional settings. 

In addition, we establish results on consistency and asymptotic normality. This provides a rig- 
orous platform for conducting model selection and statistical inference. The asymptotic results are 
proven generally and can b e viewed as an independent contribution to the r andom fi eld li terature, 
expanding on the results of iMardia and Marshalll ()1984l ) and others, such as iGuvonl (|1982j ). It also 
expands known asymptotic results for time series estimated with regression effects, as alluded to 
in Remark 2. 

Owing to the computational advantages and flexibility of the cepstral random field model, cou- 
pled with the asymptotic results presented here, we believe the model provides a natural alternative 
to those currently used in practice. The primary contribution of this paper resides in firmly estab- 
lishing the theoretical aspects of this model. A topic of future research is to conduct wide-spread 
simulation studies and apply the models to a diverse set of applications. These studies are expected 
to further demonstrate the utility and flexibility of the cepstral random field model. 
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Appendix 

Derivation of ©. (^'[G^) equals 

I J rs 

{p + l- j)(M + l-r) 



2p+l 

exp < 

j,k=i 
v 



M 



E ^ 

',h=-p 



exp < —7TI 



(M+l-r) 
~M 



®k-p-i, P +i-j exp <^ iri 



(M + l-s) 
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which produces the stated result using ([3]). 



Proof of Lemma [T], For notational convenience in the proofs, we will consider the frequencies in 
the interval [0, 27r] rather than [— 7r,7r]. So let the Fourier frequencies be denoted \ S) n = 2irsN~ 1 , 
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and define ^y h k (F) = N~ 2 X^fLi YltLi F(X s ,n, ^t,Jv) exp{i/iA S) 7v + ik\t t N}, which is a discrete ap- 
proximation to the relation jh,k(P) = < Fexp{ih ■ +ik-} >. Fixing h, we can consider to be 
a N x N matrix with jkth entry Jhj-kiP)- The notation suggests that this is an approximation 
to S(-P\) defined above. By univariate results (jTaniguchi and Kakizawa . 2000 . p. 491) we can write 

E(F h ) = U* N D N (F h )U N (A.l) 

with [C7jv]jfc = N _1 ' 2 exp{ij'Afe ) jv} an d D^{Fh) defined to be a diagonal matrix with £,£th entry 
given by A^ 1 J2^=i P(X S ,N, A^jv) exp{i/iA S) jv}. The * denotes the complex conjugate transpose. 

Next, we construct E(.F) as an approximation to E(F), consisting of Toeplitz blocks 
in the jkth slot. Since the Un matrices in (jA.ip are common to each Y,(Fh), we obtain 

E(F) = diag(Eft) [ J Djv( J F i - fe )]f fc=1 diag(C/ Ar ). 

Here [Djv^-.j.)].^ denotes a block- Toeplitz matrix consisting of the diagonal matrices D^{Fj_k) 
in each jkth block. This block- Toeplitz matrix (with diagonal blocks) can be reformulated as 
a block-diagonal matrix (with Toeplitz blocks) by application of a permutation matrix P. To 
formalize this, let P be a iV 2 x iV 2 dimensional permutation matrix that for each 1 < k < N 
takes the fcth row of each block matrix and stacks them in order in the kth "mega-row", i.e., rows 
(A; — 1)N + 1 through kN. Then P[D n (Fj_k)}^ k=1 P' is equal to a block-diagonal matrix consisting 
of Toeplitz matrices [D^{Fj-k)] u in the £ih block. We denote these matrices by £[-F(A^at)], which 
have jkth. entry A^ 1 J2^=i P(X S ,N, A^jv) exp{i(j — fc)A Si j\r}- It then follows that 

F = diag(t^) P' diag{diag{F(A J - N , A^v)}f =1 }f =1 P dmg(U N ). 

Now the middle matrix is a doubly-diagonal matrix consisting of all the values of F at Fourier 
frequencies, in lexicographical order. Let us abbreviate this matrix by Dn(F). Then this approxi- 
mation Ti(F) is particularly convenient, since 

S(F)S _1 (G) = diag(E^) P> D N {F/G) P diag(C/^) = E(F/G) 

for bounded spectra. At once we find that the trace of this expression is 

N N 

f(\ s ,n, xt,N)/ G(x St N, x^n), 

s=l 4=1 

which when divided by N 2 will converge to < F/G >. The extension to multiple Fs and Gs uses 
similar arguments. So the proof of the lemma depends on establishing that we can swap E(F) for 
S(F). 

Consider the base case of the trace of S(i ? )S~ 1 (G), since the same techniques can be generalized 
to multiple terms. Then 

N^trmF)^- 1 ^)} - N~ 2 tr{Z(F)Z~\G)} 

= N- 2 tr{[Z{F) - E(F)] E-^G)} + iV- 2 tr{E(F)S~ 1 (G ! ) [S(G) - E(G)] E" 1 ^)}. 
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Note that by Lemma A.l of iDahlhaus and Wefelmeverl (|1996l ) \tr(AB)\ < \A\\B\ where | • | denotes 
the Frobenius norm. Also we use the fact that the Frobenius norm of a product can be broken 
down in terms of 2- norms || • ||, so that 

lir^FOE-^G) [2(G) - E(G)] S _1 (G)}| < ||S(F)S- 1 (G)|| • |S(G) - S(G)| • ||S _1 (G)||. 

With these results, it will suffice to show the boundedness of 

|S(F) - S(F)| 2 = £ | 7fcj -_ fc 

or equ ivalently that ^ • fc ft It/i. ; j— fc ~ 7fo,j-fc| is bounded. The analysis requires a generalization 
of the IWahbal (|1968l ) result from time series to spatial random fields. First note that 7^ k = 



Ylm 1 1h+mN,k+iN by properties of Fourier frequencies, and thus 

lh,j-k - lh,j-k = Y2 Ih+mNj-k+tN- 
m,£^(0,0) 

Then the sum of the absolute difference can be bounded as follows: 

Y \lh,j-k-%,j-k\ 
j,k,h 

< Yj I Yj lh+mN,j-k+lN\ 
j,k,h m,£^(0,0) 

< ^2( N ~\i\) Y Y hh+mN,i+M\ 
\i\<N \h\<N m,ty(0,0) 

00 00 

= EEE ^2 (n - \i\)\j h+mN7i+m \ + J2 Y ^(^-KDbM+wi 

|m|=l i \h\<N\i\<N \£\=l\h\<N \i\<N 

00 

= 2 Y Y Y Y ( N ~ liDbh+mNJ+wl + Y Y Y ( N ~ ftl) (hh+N,i+w\ + \lh-N,i+w\ 
\m\=2 I \h\<N\i\<N £ \h\<N\i\<N 

00 

+ 2 Y Y Y^ N -\ i \^h,i+m\+ Y Y( N -w^h,i +N \ + hh,i-N\) 

\£\=2 \h\<N\i\<N \h\<N \i\<N 

^ 2 Y Y Y( N ~ Whrnj+Wl + 2Y Y( N ~ 

\m\>N I \i\<N h,l \i\<N 

+ 2N YY hhA+ 2 Y Y wind 

\£\>N \h\<N h \i\<N 

^ 2C Y Y i*ii7m,ii+2jv Y Y i^vi 

|m|>iV|t|<iV \m\>N \£\>N 

+ 2C Y Y \i\hh,i\ + ™Y Y i^vi 

h \i\<N h \1\>N 

+ 2N YY \^a + 2 Y Y 1*1 w> 

|£|>iV |/i|>AT h \i\<N 
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where C > denotes a generic constant. At this point we utilize our definition of J- to conclude 
that the entire sum is bounded in N. As a result, the trace approximation has error of order N~ 2 , 
which is also the approximation error in passing to the Riemann integral. □ 



Proof of Lemma [2l First note that I-s is the periodogram of the random field W . To establish 
joint convergence, we use the Cramer- Wold device and linearity of the forms; so without loss of 
generality consider the problem with a single weighting function G. First note that the expectation 
of the spatial periodogram is 



EI^(A) = ^E 7t _ s (/) exp{-iA • (t - s)} 



t.s 



7hexp{-«A • h}(l 

\hi\<N,\h 2 \<N 



M/N^l-M/N), 
(A.2) 

where the first sums h ave integer ind ices ranging from 1 to N. Thus under (|24p we have E/^(A) — 
-F(A) = 0(N~ 1 ) as in iGuvonl ( 19821 ) (note that this is weaker than the 0(N~ 2 ) result analogous 
to the time series case). Further results, whic h are needed to prove the Lemma's claims, involve 



generalizations of Theorems 4.3.1 and 4.3.2 of iBrillingerl (|200ll ) to the spatial field case. Consider 
iV -1 G(\)Ip(\) (where the sum is a double sum over pairs of Fourier frequencies), which has 
mean AT" 1 J2x G ( A ) E [^( A )] ~ N <GF>. That is, N~ 2 J2\ G(A)E[i~(A)] -> < GF >, which does 
not vanish when we normalize by multiplying by N, as happens in the one-dimensional (i.e., time 
series) case. This arises from the fact that in ()A.2p 

(1 - h x /N){l - h 2 /N) = 1 - (h x + h 2 )/N + h^/N 2 , 

and the 0(N~ l ) terms do not vanish; this establishes the last claim of the Lemma. However, the 
results change when we redefine the periodogram using unbiased sample acfs, i.e., = 7/ l (i ? ). 
Also 

E1~(A) - F(X) = ^(F) exp{-a • h}, 

\h!\>NLl\h2\>N 

so that N- 1 Y,xG{X)(E[T^(X)]-F(X)) by condition B x . The variance of iV" 1 J] A G(A)/^(A) is 
given by an expansion involving cumulants. Let t?^ m and ip£ m for £, m = 1,2 be defined as various 
signed combinations of frequencies Ai, A2, uji, w 2 , given as follows: 



(-Ai,-A 2 ) (Ai,A 2 ) 
(-wi,-w 2 ) (wi,w 2 ) 



c#ll,</>ll) (#12, ^12) 
(#21, ^21) (#22,</?22) 



For the variance calculations, we can work with I^(X) instead of I-s(X), using condition B\ to control 
the error. The variance is then 

iV- 2 ^G(A)G( W )cum (i~(A), 1^ 

= AT-2 Ar -4^^ G(A)G(a;)cum (w($ em ,cp em ) : (i,m) £ vi,--- ,W(#lm,<Ptm) ■ (t,™,) E u q 
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where the sum is over all indecomposable partitions v of the table 

(1,1) (1,2) 
(2,1) (2,2) 

Therefore the only proper partitions that need be considered are {(1, 1), (2, 1)} (J{(1, 2), (2, 2)} and 
{(1, 1), (2, 2)} (J{(1, 2), (2, 1)}. We proceed to calculate the relevant asymptotic cumulants: 

cum (W(-Ai, -A 2 ), W(-wi, -u) 2 j) 
= 7t 2 _ tliS2 _ Sl exp{i (Aiii + A 2 si + uit 2 + uj 2 s 2 )} 

t,s 

CN- hl N-h 2 \ 
y~] exp{i(Ai + ijJi)t\ + i(\ 2 + u)2)s\} , 
tl = l 81 = 1 / 

which is order N 2 when Ai = — uj\ and A 2 = — uj 2 , but is lower order otherwise. In the former case, 
the cumulant is asymptotic to N 2 F(— — uj 2 ). The next cumulant is that for W(Ai,A 2 ) times 
W(cji,u> 2 ), but the analysis is just like the previous case (only with minus signs deleted), and the 
asymptotic in the case Ai = —uj\ and A 2 = —to 2 is N 2 F '(101,102)- Next, we have 



cum (W(-Ai,-A 2 ),W(u;i,a; 2 ) 
= 7 t2 _ tl)S2 _ Sl exp{i (Xiti + A 2 si - uit 2 - u 2 s 2 )} 

t,3 

/ N—hi N-h 2 \ 

= ^2lhexp{-icoihi - iu 2 h 2 ] I ^ ^ exp{z(Ai - ui)ti + i(X 2 - u 2 )si} J , 

h \tl = l Sl = l / 

which is order iV 2 when Ai = lo± and A 2 = uj 2 , but is lower order otherwise, and thus is asymptotic 
to N 2 F(ui\, uj 2 ). Finally, the cumulant of W(Ai,A 2 ) and W(— ui,— lo 2 ) is similarly computed, with 
the result of N 2 F(— oji, —ui 2 ). 

There is also the cumulant involving all four discrete Fourier Transforms, namely 



cum(W(-Ai,-A 2 ), W(Ai,A 2 ), W(-wi,-w 2 ), W(wi,w 2 ) 
= ^2 cum (W t , W s , W m , W n ) exp{i (Xiti + X 2 t 2 - A1S1 - A 2 s 2 + u\m\ + w 2 m 2 - wini - uj 2 n 2 )} 

t,s,m,n 

~ ^ 2 eX P{^ (-^1^1 ~ ^2^2 + Wl^i + W 2 /C 2 - CJi^i - W 2 £ 2 )} 

= iV 2 FF(Ai,A 2 ,-wi,-a; 2 ,a;i,w 2 ). 
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Then consolidating, the variance of N 1 Y2\ G(X)Ip(X) * s asymptotic to 
N~ 4 ^ G(X)G(co)FF(X 1 , A 2 , -u> u -u 2 , "i, w 2 ) 



A u> 

iV- 2 ^(G(A)G(-A) + G 2 (A))F 2 (A) 

A 

-4 



(27T)" 



[-7r,7r] 4 



G(\)G(u)FF(\, -u, u)dXdu + (2tt) 



(G(A)G(-A) + G 2 (A)) F 2 (A) dX. 



This is the main calculation, but some details remain. The above expression provides the asymptotic 
variance, and once generalized to a joint result over several functions Gj yields the stated matrix V. 
Our results have been derived for iV _1 < Gj(I^ — F) > ; for N -1 < Gj(I^ — F) > we can proceed 
as in Theorem 5.10.2 of Brillinger (2001), using the boundedness of the functions Gj and the above 
discrete Fourier Transform results to bound the difference of Ip(2irbN , 2irsN~ 1 ) and Ig(X) over 
the region {A G ftirtN- 1 , 2vr(t + ljN' 1 } x [27r*JV -1 , 2vr(s + ljAr 1 ]} by O p (A^~ 1 ). 

Finally, higher order moments of A^ -1 < G{I-s — F) > will be asymptotically negligible due to 
the higher order (k > 3 in Condition B/.) cumulant conditions, which is proved along the lines of 
the combinatorial analysis of Theorem 1 in iMcElrov and Holanl (|2009l ) . This asymptotic structure 
for the cumulants indicates a limiting normal distribution, as stated. □ 



Proof of Theorem Q3 We begin with the analysis of (3. From (|23p and ([21 p we obtain 



X'E(Fp )X 

Bqmle 



X'^FZ 1 )[Y-X~P 



vqmle 



x'ttMfh )x] x'e -1 ^ )(y-xb 



Pqmle — P 
Pmle — P 



So the regression errors in both cases are linear functions of W; consistency follows from A2 and 
the condition that (X'X) tends to the zero matrix. By the cumulant conditions that we assume 
(A3), the random field satisfies a central limit theorem. Moreover, we can replace the estimates 
8 by their limits 9 (their consistency is proved below) because the error in doing so is of lesser order 
by Lemma [TJ Moreover by the same Lemma, A r_1 X'S -1 (Fg)W has the same asymptotics as 
N^ 1 X'Y,(F_g)W , which is asymptotically normal with mean zero and covariance 

N- 2 X'Y,{Fz)?,(F)T,(F_q)X. 



This result follows from the argument used in Theorem 2 of iMardia and Marshalll (jl984l ). together 
with A2 and the assumption that (X'X) tends to a zero matrix as N — > oo (plus the central 
limit theorem for by A3, as our data process may be non-Gaussian). Now applying the matrix 
Mx(&) to the asymptotic normality result yields the limit theorems for the regression QMLE and 
MLE. This argument holds for the exact Whittle estimates, but as in the proof of Lemma [21 can 
be extended to the case of approximate Whittle estimates as well (see next paragraph). Below, 
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we will show that the MLE and QMLE for can be expressed as a quadratic form in the data 
(asymptotically), and hence is uncorrelated with the regression estimates when the third order 
cumulants of the data process are zero. 

Now we consider the asymptotics for the parameter 9. First we examine the exact and ap- 
proximate QMLEs. One may d evelop an error expression along the lines given for Lemma 3.1.1 of 



Taniguchi and Kakizawal (|2000l ). with 9 denoting the exact QMLE and 9m the approximate QMLE: 



P (1) 
o P (l) 



VV'KL(F ? , F) 
W'KL N (Fg,F) 



< VF~ 1 h-F) > 



< VFZ 
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F) > 



N 



which are developed by Taylor Series expansion in each case. In order to write down this expansion, 
we must replace I-s by I-*, which uses the consistency of j3. Now in the proof of Lemma [2] it is 
established that KL and KLn differ in their stochastic terms of order Op(iV _1 ), whereas the log 
determinant terms differ by 0(N~ 2 ) via the definition of the Riemann integral. These types of 
results also hold for the difference of the gradients of the KL functions, since Vi^ -1 is continuous 
by assumption. Then apply Lemma[2]to the above convergences, as V-Fg -1 = —VFg ■ Fq~~ 2 , which is 
continuous by assumption A2. Because the fourth order cumulants are zero, V has the form stated 
in the theorem. 

Next consider the MLE case, where the scaled log Gaussian likelihood is C given in (|20p . The 
method of proof follows the treatment in Section 4.1 of lTaniguchi and Kakizawal (|2000l ). but with 
their Lemma 4.1.2 replaced by our Lemm a [TJ Also note that many of the calculations are identical 
with those of iMardia and Marshalll (| 19841 )- though our regularity conditions together with Lemma 



[JJ essentially verify condition (hi) of their Theorem 2 (also those authors assume a Gaussian spatial 
process, whereas we allow small departures from normality). First 



= Vlog C{9 M le) 



V log C{9) + VV log C{9)(9 -9) + -J{9\ 9 



(A.3) 



for some 6* such that ||0* — 9\\ < \\t 
that linearly depends on 9 — ; if 9 



9||, with || • || the Euclidean norm. Here J is a 3-tensor 
) = Op(iV~ 1 ), then this term will be P (N~ Z ). As in 
Taniguchi and Kakizawal (|200ol . p. 180) set U N = N(9 - 9) and Z N (9) = N~ X V logL(0), which 



equals X'^- 1 (F e )V^(F e )i:- 1 (F e )X/(2N) - tr(S- 1 (F e )VS( J F e ))/(2A^). Here VS(F e ) is a short- 
hand for an array of matrices, each of which is the derivative of T,(F$) with respect to one of the 9j. 
Then from (|A.3|) we have Un = — [N~ 2 Hn(9)] Zn(9) plus terms of order iV~ 2 , with Hn equal to 
the Hessian of C. The mean of Z^{9) is given by 

^ tr {e-^^JVE^JS- 1 ^) - E(F e )] } = O^- 1 ) + ^ < VF e F~ 



F - Fa 



>, 



where the second equality follows from Lemma [TJ The final integral is actually equal to the gradient 
of KL(Fg, F), and thus is identically zero when 9 equals the pseudo-true value, no matter whether 
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the model is correctly specified or not. Also since the fourth cumulants are assumed to be zero, 

Var [Z N {6)\ = -L tr (z-\Fg)VZ(Fe)Z-\Fg)Z(F)Z-\Fg)VZ(F e )Z~\F e )Z(F) 

which by Lemma Q] conv erges to the matrix V(9)/ 4. Hi gher order cumulants of Zn are analyzed 
along the lines offered in lTaniguchi and Kakizawa only replacing their use of Lemma 4.1.2 



(e.g., in their equation (4.1.24)) by our Lemma[TJ Hence Zn(9) ==> A/(0, V(9)/4). Also we have 

2d j d k log£(9) = -X'E- 1 (F e )d j E(F e )E- 1 (F e )d k E(F e )E- 1 (F e )X 
-X"E- 1 {F e )d k ^(F e )E- 1 (F e )d j E{F e )^ 1 (F )X 
+ X'E- 1 (F e )d j d k E(F e )E-\F )X 

- tr{E- 1 (F e )d j d k ^(F e ) - E- 1 (Fg)d j E(Fg)E- 1 (Fg)d k E(Fg)} , 

which is twice the Hessian matrix H^(6). The expectation is 

EH N (9) = -tr\E- 1 (Fg)VE(F e )E- 1 (F )VE(F e )a-\F )E(F)j 
+ tr {S-^F^VV'S^^E-^F^E^)} /2 

- tr (E- 1 (F e )VV / E(F,) - E- 1 (F e )VE(F e )E- 1 (F e )V'E(F 9 )} /2, 
and applying Lemma [U we have N~ 2 EHn(9) tends to 

- < Fg 3 VF g VF g F > + < F- 2 VV'F e F > /2- < F e l VV F e > /2+ < Fg 2 VF g VF g > /2, 
which is half the Hessian of KL(Fg,F), i.e., H{9)/2. Also by Lemma[T]we obtain 
Var (A^W'logL^)) ~ ^ < (^V F e V FgF~ 2 - VV'FgF^ 1 ) 2 > 

as N — y oo, and hence N~ 2 H N {9) -A #(0)/2. Then H(0)U N M(0,V(9)), and the stated 
result follows. □ 

Proof of Theorem [2]. All integrals are multi-dimensional. Let || • || denote the Euclidean norm. 
Let p(9\X) denote the posterior for (f> = [0',f3']', which is proportional to the Gaussian likelihood 
p(X\(f>) times p((f>), the prior for the parameter vector. The posterior estimate is the vector quantity 
<j> = J 4>p(4>\X) d(j); we seek to show that it converges in probability to 4>, the true parameter value. 
We further develop the likelihood as 

p(X\<l>) = exp j-^ (log(2vr) + C N (9, /?))} 
C N (9,/3) = N~ 2 log |E(F e )| + N~ 2 (Y - xpj?T x {F g ){y - A/3). 
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We next expand the quadratic form so as to relate it to the true f3 parameter: 

(Y - xp)'v-\F )(y - xp) = (Y- xpj?r x {F ){Y - xp) 

+ 20- p)'x'i:-\Fe){Y -XP) 
+ 0-P)'x'Z-\F )X0-l3) 

The first term on the right hand represents the same quadratic form, but with j3 replacing (3; 
therefore it is a quadratic form in W. The second term is linear in W, so as in the proof of 
Theorem [1] will obey a central limit theorem and is Op(N). The final term is deterministic, and 
under the conditions of the theorem is of order N 2 . As a result, 

C N (6,0)=£ N (6,0) + E N + Q(e,P) (A.4) 

where En is an error term tending to zero in probability and Q(9, P) = (P — P) M(0)((3 — j3). Note 
that by our assumptions on M(0), the quadratic form Q(9,P) is positive whenever (3 ^ (3, for any 
9, and moreover is bounded above by A\\(3 — j3\\ for all 6, where A bounds all the eigenvalues of 
M{9) for all 9. Also A is the lower (positive) bound on the eigenvalues of M(9). 

So for any e > 0, define neighborhoods N e (9) = {9 : \\9 — 9\\ < e} and similarly N e ((3). We can 
then decompose the parameter space (because all parameters are contained in a compact subset of 
Euclidean space) into N e (9) x N e ((3) and its complement. This complement can be broken into two 
overlapping regions, R\ and R2, where R\ = N^{9) and R2 = N£((3). The complements are taken 
in the entire product parameter space, so R\ corresponds to ||0 — 9\\ > e but (3 is unconstrained, 
while R2 corresponds to \\(3 — (3\\ > e but 9 is unconstrained. The intersection region R\ f] R2 is 
characterized by both constraints being true. In order to get disjoint regions, let Ri = R\\(R\ f] R2) 
and i?2 = Ri \ {R\ f] ^2)- Then we decompose the vector parameter error as 

- ~_ lN c (e)xN t 0) (0-?)exp{-^£jv(M)}p(</>)# 



+ 



fe W {-^£ N (9,(3)}p(0) # 
JfiiUfemfiiflfe) (^-^)exp{-^£jv(g,/3)} p(#)d4> 



The integrands <j> — (j> are vectors, and the integrals are interpreted as integrating each component 
and stacking the result. The first summand has Euclidean norm bounded by e with probability 
one. Focusing on the second term, we proceed to develop two key bounds (see 



Liseo et al 



2001 



Holan et al. 



2009, for a similar approach): 

C N (6,P)-£ N (0,p) >C{6) (A.5) 

for all 9 G N£(9) with probability tending to one. Note that this bound will pertain to the region 
R±, as P is not involved. The bounding function C will be shown to be convex. We also will show 

C N (9,P)-C N (9,P)<C(9) (A.6) 
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for all 9 £ Ns(6), for a convex function C and another radius < 5 < e. It must be chosen such that 
5 2 A < e 2 A. In the above error decomposition e > was arbitrary, but it will be chosen sufficiently 
small so as to guarantee that 



sup C(0) + A5 2 < inf_ C(0). 

e&N s (e) eeN£(8) 



(A.7) 



The proofs of (|A.5|) and (1A.6P follow the strategy in Theorem 3.1 of iDahlhausI (| 19891 ). although we 
have recourse to Lemma[T]for the spatial random field case. The expectation of £n(9, /3) — £n(9, P) 
equals 

-iV- 2 log {V-^FoMF-)] + N~ 2 tr (S-^F^S^) - 1^) . 



Dahlhaus 



Of cou rse 1^2 is the identity matrix. Using a Taylor Series expansion, as in Theorem 3.1 of 
(j 19891 ). produces the expectation being equal to 

tr ((1^2 + tA)- 1 A(1 N 2 + tAY 1 A} , (A.8) 

where A = T,~ 1 (Fg)'E(F~j - 1 N 2 and r is some number between and 1; note that r will depend 
upon the entries of A. The above trace expression can be re-written as a sum of traces of block- 
Toeplitz matrices, to which Lemma Q] can be applied. Observing that A = E" 1 (F g )[Y,(F~) - 
and l N 2 +tA = S- 1 (if 1 )[(l - T)E(F e ) + rS(F^)], then (1^2 + tA)~ 1 = S- 1 (G)S(F e ) with G = 
(1 — t)Fq + tFq. The sum of the block- Toeplitz matrices is again the block- Toeplitz matrix of G, 
because the summands are positive functions. The dependency of G on r, 8, and 9 will be suppressed 
in the notation. The trace expression (|A.8j) can then be expanded using these derivations, which 
yields the trace of 

T.-\G)Jl{F- s )Tr 1 (G)Jl{F- s ) 
-E-^GOE^E-^GOE^) 
-E-^S^E-^GOSOFe) 
+ E- 1 (G)E(F e )E- 1 (G)E(F e ). 

For any fixed r, the trace of the above expression, divided by A^ 2 , converges to < G~ 2 (i ? g- — Fg) 2 > 
as N — > oo; therefore the infimum of this quantity over r £ [0, 1] yields a lower bound, which is 
a function of 9. Furthermore, utilizing the bound < F^/Fg < K for some constant K, for all 
frequencies, we have G/Fg < K uniformly (and r becomes irrelevant). Then let 



C{9) = K < {F-jFg 



D 2 > 



So C{9) is clearly non- negative, and is zero only when i*V = Fg. By the identifiability restriction, 
this will only occur when 9 = 9 (the cepstral model class satisfies this condition, due to the basis 
expansion of log spectra implicit in th e mode l 's con struction). C{8) is also concave down, since 



x ^ (x - l) 2 is (cf. Proposition 2.15 of lVajdal (jl98fll )). 
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Moreover, the variance of Cn(9,/3) — Cn(9,/3) equals 2N~ 4 tr[A 2 ] under a Gaussian DGP, and 
again by Lemma [J — after an expansion of A 2 — we obtain 0(N~ 2 ) growth. This establishes (]A.5p 
for any 9. By taking the infimum of C(9) over N£(9), we can ensure a non-zero value as well, 
utilizing the convexity of the function. For the other bound, note that for any 9±, 92 we have 

C N (9 2 J) -C N (9 X J) = N- 2 (9 2 -9 i ytr{X- 1 {F 6 )X(VFe)} + N~ 2 W' (S" 1 ^) - S" 1 ^)) W, 

where 9 is in-between 9\ and 9 2 , using the Mean Value Theorem and the assumed smoothness on 
spectra. The notation E(Vi^e) denotes an array of matrices, indexed by the various derivatives of 
Fq; after the trace has been computed, the resulting vector is multiplied by \9 2 — 9\]' . Using the 
smoothness conditions on spectra, we may apply Lemma 5.5 of 



Dahlhau; 



(|1989l ) and our Lemma Q] 
to bound the absolute value of the above difference by a constant times \\9 2 — #i||, with probability 
tending to one. Letting 9\ be 9 and C_{9) = K\\9 — 9\\ for some constant K > now yields (|A.6|) . 

Because of the condition on M(9), its eigenvalues are uniformly (in 9) bounded between < A 
and A < oo, and the quadratic form satisfies 



Q(0,P) > Ae 2 

for all f3 6 N£(/3) and any 9. Furthermore we also have 

Q(9, (3) < AS 2 



(A.9) 



(A.10) 



for all (3 £ N$(f3) and any 9. Below, it is shown that we must select 5 for a given e such that 
8 2 A < e 2 A. Next, using the convexity of C_ and C, adjusting their constants if necessary, we can 
ensure that (|A.7p holds for some e, making 5 smaller if necessary. Now the second term in the 
parameter error can be rewritten as 



/*iU«aU(*in«a) V ^ 


exp{-f 


C N {9, P) - C N {9, P) + E N + Q(9, (3) 


| p{4>) dcj) 


Jex P {-f 


C N (O,0)- 


C N (9,f3)+E N + Q(9,f3) 


} P(4>) d<t> 



by multiplying top and bottom by the constant £n(9,{3) and utilizing (|A.4|) . Furthermore, the 
denominator (which is a scalar) can be made smaller by restricting the region of integration, which 
only increases the absolute error of each component of the parameter error vector. We will restrict 
the integration to the set Ng(9) x Ng(/3). At this point, we break the analysis up by considering 
the numerator integral (still as a vector quantity) over the various disjoint regions. We present an 
analysis for integration over R\ and R 2 , and note that either analysis will then pertain to Rif]R 2 - 
First consider the numerator restricted to integration over R\, and take the Euclidean norm of 
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this portion of the error vector; then we obtain the upper bound 





<f>-4> 


exp{-^[C(0)]}p(^ 


/7V,(e)x7V a (/3) eX P' 





< 



Jr.! 





( N 2 


<t>-4> 


exp j- — 



with probability tending to one. The second expression follows from the first by taking supremums 
and infinums. The first expression uses the triangle inequality for integrals along with (|A.5p . (|A,6p . 
and (jA.lOp , Note that these bounds hold with probability tending to one, and therefore they also 
hold with probability tending to one when the term En (which tends to zero in probability as 
N — > oo) is inserted. In the numerator, (|A.5[) applies because we are in the set R\, whereas the 
other bounds hold on the set Ng(6) x N$(f3). The overall bound on this term tends to zero as 
N — > oo, because the numerator integral involves the exponential of — A^ 2 times a positive quantity 
- it is positive by (|A.7p . 

Next, we consider the numerator restricted to R2, and mimic the same line of argument, ob- 
taining the upper bound 



R2 



exp{-f [A e 2 ]} 



< 



R 2 





( N 2 


<j>-4> 


exp j- — 



Ae 2 - su VeeNs{?) C{9) - A8 2 ] } p{4>) # 



with probability tending to one, using (|A.9p . Now we need the condition that sup^^^ Q_{9) < 
e 2 A — <5 2 A, which is assured by taking 5 even smaller as needed. Then this term also tends to zero 
as N —7- 00, which concludes the proof. □ 
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